Mercator as a web crawler

Author

  • Priyanka Saxena
Abstract

Mercator is a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well documented in the literature. In this paper, we enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be efficient and comparable to that of other craw-

Similar resources

A Novel Architecture of Mercator: A Scalable, Extensible Web Crawler with Focused Web Crawler

This paper describes a novel architecture of Mercator: a scalable, extensible web crawler with a focused web crawler. We enumerate the major components of any scalable and focused web crawler and describe the particular components used in this novel architecture. We also describe this novel architecture's support for extensibility and downloaded user’s support information. We also describe how the ...

Using High Performance Systems to Build Collections for a Digital Library

Nothing is more distributed than the Web, with its content spread across thousands of servers. High-performance hardware and software are essential for the effective download, analysis, and organization of this content. We describe our experience with a highly parallel Web crawling system (Mercator) to construct, automatically, collections of scientific resources for the National Science Digita...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Implications of Web Mercator and Its Use in Online Mapping

Online interactive maps have become a popular means of communicating with spatial data. In most online mapping systems, Web Mercator has become the dominant projection. While the Mercator projection has a long history of discussion about its inappropriateness for general-purpose mapping, particularly at the global scale, and seems to have been virtually phased out for general-purpose global-sca...

Survey on – Self Adaptive Focused Crawler

A focused crawler may be described as a crawler that returns relevant web pages on a given topic while traversing the web. Web crawlers are one of the most crucial parts of search engines, used to collect pages from the Web. Building a web crawler that downloads the most relevant web pages from such a large web is still a major challenge in the field of information retrieval systems. Most Web ...

Publication date: 2012